ECHO USER'S GUIDE


INTRODUCTION 

The ECHO* classification functions are designed to 
identify objects in multispectral data, gather the statistics 
of the identified objects, and where possible, to classify 
the data on an object-by-object basis. 

The motivation for this approach to classification is to
include spatial as well as spectral information in the
classification algorithm and thereby increase the classification
accuracy. One by-product of one ECHO implementation is that
ECHO classifications require less CPU time than the standard
point-by-point classifier.

Point-by-point classifiers, such as the LARSYS CLASSIFY- 
POINTS function, compare spectral measurements from each fea- 
ture of each point to class statistics, computing a likelihood 
or discriminant function associated with each class, and cate- 
gorizing the point as belonging to the class with the largest 
discriminant function value. Each point is classified 
individually, on the basis of spectral measurement alone. One 
premise of this technique is that the objects of interest are 
large in comparison to the size of the point. If this were not 
so, a large portion of points would be composites of several 
classes, making statistical pattern classification unreliable 
since pre-specified categories would be inadequate to describe
actual states of nature. From this premise it follows that
objects are represented by arrays of points, and that a
statistical dependence exists between consecutive points.
Point-by-point classifiers fail to exploit the statistical
dependence between adjacent points when assigning classes.

* ECHO stands for Extraction and Classification of Homogeneous
Objects.

The ECHO processors benefit from spatial information by
aggregating points whose spectral responses are not
significantly different in a statistical sense into groups, and
then applying a maximum likelihood classification rule to these
homogeneous groups. Homogeneous objects are identified in a
three-step process. First, cells are formed by systematically
partitioning the data into N by N sized blocks of pixels. The
statistics of each cell are then compared to a homogeneity
threshold.

Points which do not comprise homogeneous cells (that is, the
constituent points of cells not meeting the homogeneity criterion)
are classified on a point-by-point basis, just as contemporary
classifiers categorize all points. In the third step, statistics
of adjoining homogeneous cells are compared. Adjoining cells which
appear to belong to the same statistical population on the basis
of user-supplied annexation thresholds are combined into a
single object, and the object is sample classified. To perform
both the sample and the point-by-point classifications, Gaussian
(or multivariate normal) class distributions (class mean vectors
and covariance matrices) are required. A flow diagram of this
process is presented in Figure 1.
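The three-step process above can be sketched in Python. This is a minimal illustration under stated assumptions, not the ECHO code: the names `partition_cells` and `annex_cells` are hypothetical, the homogeneity and annexation tests are passed in as caller-supplied predicates, and annexation is simplified to a pairwise comparison of edge-adjacent cells merged with union-find, whereas the actual processors may compare a cell with the statistics of the field it is joining.

```python
from typing import Callable, Dict, List, Tuple

Pixel = Tuple[float, ...]   # one value per spectral channel
Image = List[List[Pixel]]   # image[line][column]
Cell = Tuple[int, int]      # (cell row, cell column)


def partition_cells(image: Image, n: int) -> Dict[Cell, List[Pixel]]:
    """Split the image into non-overlapping n x n cells.

    Pixels in a partial cell at the right or bottom edge are skipped here
    (an assumption; the guide does not say how edge pixels are handled).
    """
    cells: Dict[Cell, List[Pixel]] = {}
    for r in range(len(image) // n):
        for c in range(len(image[0]) // n):
            cells[(r, c)] = [image[r * n + i][c * n + j]
                             for i in range(n) for j in range(n)]
    return cells


def annex_cells(
    cells: Dict[Cell, List[Pixel]],
    is_homogeneous: Callable[[List[Pixel]], bool],
    same_population: Callable[[List[Pixel], List[Pixel]], bool],
) -> Tuple[List[List[Cell]], List[Cell]]:
    """Group edge-adjacent homogeneous cells into fields using union-find.

    Returns (fields, singular_cells). Singular cells would be classified
    point by point; each field would be sample classified as one object.
    """
    homogeneous = [k for k, px in cells.items() if is_homogeneous(px)]
    homogeneous_set = set(homogeneous)
    singular = [k for k in cells if k not in homogeneous_set]
    parent = {k: k for k in homogeneous}

    def find(k: Cell) -> Cell:
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path halving
            k = parent[k]
        return k

    for r, c in homogeneous:
        for nb in ((r, c + 1), (r + 1, c)):  # right and lower neighbours
            if nb in parent and same_population(cells[(r, c)], cells[nb]):
                parent[find(nb)] = find((r, c))

    fields: Dict[Cell, List[Cell]] = {}
    for k in homogeneous:
        fields.setdefault(find(k), []).append(k)
    return list(fields.values()), singular


if __name__ == "__main__":
    def spread(px): return max(p[0] for p in px) - min(p[0] for p in px)
    def mean(px): return sum(p[0] for p in px) / len(px)

    img = [[(10,), (10,), (50,), (50,)],
           [(10,), (10,), (50,), (50,)],
           [(10,), (11,), (10,), (90,)],
           [(10,), (11,), (10,), (10,)]]
    fields, singular = annex_cells(partition_cells(img, 2),
                                   lambda px: spread(px) <= 5,
                                   lambda a, b: abs(mean(a) - mean(b)) <= 2)
    print(fields, singular)  # [[(0, 0), (1, 0)], [(0, 1)]] [(1, 1)]
```

In the usage example the lower-right cell contains one outlying pixel and is flagged singular, while the two left-hand cells are merged into one field.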

Two separate ECHO approaches have been developed. The
first, supervised ECHO, makes use of pre-specified multivariate
normal class distributions to identify homogeneous objects.
The other, nonsupervised ECHO, identifies objects without the
use of class statistics. Both processors require pre-specified
class statistics (class mean vectors and covariance matrices) to
classify the objects identified. Objects identified by the
nonsupervised field extraction algorithm (without the benefit of
class statistics) may be used as an aid in obtaining the class
statistics needed for the classification phase of the algorithm.

The two succeeding sections of this User's Guide deal with 
the supervised and nonsupervised algorithms, respectively. The 
supervised processor tends to be somewhat more accurate than 
the nonsupervised processor due to the use of the class statis- 
tics in the identification of homogeneous objects. On the other 
hand, since the nonsupervised processor does not require class 
statistics for object identification, the object map which it 
produces can be used to aid in developing the class training 
statistics.

Additional background information on ECHO may be found in
the LARS Final Report to JSC in May 1975 [1], R. L. Kettig's
doctoral thesis [2], a LARS Information Note [3], symposium
proceedings [4], the LARS Final Report to JSC in May 1977 [5],
and the LARS Final Report to JSC in November 1977 [6].

SUPERVISED ECHO CLASSIFIER (SECHO)

Input to the function is: 

* Data from a Multispectral Image Storage Tape,

* Control cards to select processing and output options,

* A statistics file containing the statistical description
of the training classes,

* A data deck containing Field Description Cards to
identify the area or areas to be classified.

The user has a wide range of control over the actual
parameters used when processing data. He may elect to produce a
Classification Results File, placed either on tape or on disk,
using either a one-phase or a two-phase approach. When the
two-phase approach is selected, the data is partitioned into
N by N cells of user-specified size, statistics are gathered
for the cells, and those cells whose statistics do not pass the
user-specified homogeneity criterion are identified. This cell
processing information is then written to an Intermediate
Results Tape. The second phase of the two-phase approach uses
the Intermediate Results Tape and the user-specified annexation
criteria to produce the Classification Results File. The
advantage of the two-phase approach is that it allows the user
to produce results using different cell-to-cell annexation
parameters without repeating the expensive process of
gathering cell statistics each time. When the supervised
ECHO processor is run in a single phase, all of the processing
listed above is accomplished without the need for an
Intermediate Results Tape.
Although the Intermediate Results File has the same basic
format as the Classification Results File, it is used only for
storing information produced by the cell processing phase
(where a cell is an N by N block of data points).

This file is used as input to the cell annexation phase, which
joins cells with similar characteristics and produces
classification results.

Note: The Intermediate Results File produced by the supervised
ECHO processor is not compatible with the Intermediate
Results File produced by the nonsupervised ECHO processor.

The Intermediate Results Files generated by the two ECHO
implementations should not be stored on the same tape.

The Classification Results File is normally used as input
to the PRINTRESULTS function to produce a variety of printed
output for the evaluation of the classification. It is also
the primary input to the COPYRESULTS, LISTRESULTS, and
PUNCHSTATISTICS functions. The file must be stored on tape for
use by the latter two LARSYS functions.

SECHO produces four standard and three optional printer 
output products. Standard printer outputs include a control 
card listing, a list of the channels considered, a list of 
classes to be used, and an identification header listing 
characteristics of the run. The optional printer outputs are 
statistical summaries for the classes considered, a singular 
cell map, and a classification summary map. Only one of the 
latter two map outputs may be requested for a single execution 
of the processor. More detailed descriptions of these outputs 
appear later. 

Inputs

The supervised ECHO classifier, as mentioned above, consists
of two main parts: (1) the cell processing phase, carried
out first, in which cell statistics are gathered and
nonhomogeneous (singular) cells are screened out, and
(2) the cell annexation phase, in which the cell information is
used to join, or annex, neighboring cells with sufficiently
similar spectral characteristics into fields (groups of cells)
and to classify each entire field. These processing steps can be
conducted either sequentially in a single execution of the
processor or independently in two separate SECHO executions.
Consequently, the input data required for each step of processing
will be discussed separately.

Cell Processing Phase 

The initial cell processing phase requires input of 
control cards, Field Description Cards for the areas to be 
classified, a Statistics Deck for training the classifier and 
for object identification, and the Multispectral Image Storage 
Tape. The supervised ECHO processor uses the identification 
information on the LARSYS Field Description Cards, along with 
the System or User Runtable File to identify and request the 
appropriate Multispectral Image Storage Tape. The format of 
the Multispectral Image Storage Data File and the LARSYS 
Runtable File can be found in the LARSYS System Manual [7].

Input statistics must be placed in the Statistics File
before being used by the supervised ECHO classifier. A
Statistics File is made available to the ECHO classifier either by
executing one of the LARSYS functions that uses the statistics
information or by including the statistics information in the
control card file. Any of the LARSYS functions CLASSIFYPOINTS,
STATISTICS, SEPARABILITY, CLUSTER, or SAMPLECLASSIFY may be
used to transfer the statistics into the Statistics File.

The 'STATDECK USE' command may also be issued to transfer 
to the supervised ECHO processor a previously saved Statistics 
File. 

If the user chooses to include the statistics in his
supervised ECHO input deck, he must also include a 'CARDS
READSTATS' control card in the deck. The statistics card deck
is inserted into the input deck as the first group of data
cards, preceding the Field Description Cards which describe
the areas to be classified. Otherwise, the Statistics File
is assumed to reside on the user's Temporary Disk.

Several control card parameters are required by the cell 
processing phase. The channel numbers of the data to be pro- 
cessed are required; the cell width (number of data points on 
each side of a square cell) must be declared; the cell homo- 
geneity threshold (for differentiating homogeneous cells from 
singular cells) must be specified; optional selection of a sub- 
set of the training classes represented in the Statistics File 
may be specified; and declaration of the areas to be classified 
must be made. 

Another required input is the destination of the results.
As has been pointed out, the cell processing phase and the cell
annexation phase may be carried out either jointly, in a single
execution of SECHO, or independently, in two separate executions
of SECHO. When the two phases are to be executed independently,
an Intermediate Results File must be specified as the destination
of the cell processing output. When the cell processing phase
and the cell annexation phase are to be run jointly in a single
execution of the processor, a destination for the final results
must be included. The Classification Results File may be placed
either on disk or on a Results Tape.

An example control card deck for executing the cell pro- 
cessing phase (phase 1) of the supervised ECHO processor is 
presented in Figure 2. 

Cell Processing and Annexation

When all processing is to be accomplished in one step
(both phases run in a single execution), only the annexation
threshold and the final results location need to be added to the
information required by the cell processing phase. When the
'INTERMEDIATE TAPE' control card in Figure 2 is replaced by a
'RESULTS' control card and an 'ANNEXATION' control card is
added, cell processing and annexation occur in one step and
a Classification Results File is produced. Figure 3 is an
example of the control cards necessary for the execution of
both the cell processing (phase 1) and the cell annexation
(phase 2) algorithms in a single step. Note: No 'INTERMEDIATE'
control card may be used when single-step processing is desired.

Cell Annexation Phase

When independent execution of the cell annexation phase
(phase 2) is desired, the 'INTERMEDIATE' control card is
required to specify input from the Intermediate Tape File produced
by the previously executed cell processing phase (phase 1). An
'OPTIONS INTERMEDIATE' control card must appear in the card
deck to indicate that only the cell annexation algorithm is
desired. In addition, a Classification Results File destination
must be specified. All cell width, channel calibration, and
optional training class selection information is extracted
from the Intermediate Tape and need not be respecified. Figure
4 is an example of the control cards necessary to complete an
ECHO classification. Execution of the control cards in Figure
2 would have supplied the Intermediate Results Tape, which
contains the cell processing input for the annexation phase.

Figure 2. SECHO Cell Processing Phase

Figure 3. Example Control Cards for Joint Execution of Both Phases of SECHO

Specification of Channels: The multispectral data channels
to be used by the supervised ECHO classifier must be specified
by including the CHANNELS control card. This control card
must appear whenever cell processing is to be performed
(either for execution of the cell processing phase or for
joint execution of both ECHO phases). The user specifies
channels in this manner:

CHANNELS I, J, ...

where I, J, ... are the channel numbers to be used. Appendix
IV of the LARSYS User's Manual [8] contains information on how
this card may also be used to calibrate data from the
Multispectral Image Storage Tape.

Optional Selection of Training Classes: The user may select the
training classes from the Statistics File that are to be used
by supervised ECHO's cell processing phase (phase 1), and he may
combine training classes into pools. These options are exercised
by using the 'CLASSES' control card. For example, if the
user wishes to use only classes 1, 3, and 5 of seven training
classes previously defined by the STATISTICS function, the
control card entry would be:

CLASSES 1, 3, 5

Figure 4. Example Control Cards for Execution of the Annexation Phase of the SECHO Processor

In this case, the class names assigned by the STATISTICS function
to classes 1, 3, and 5 will be retained by SECHO, and the
other classes will be totally ignored.

To combine two or more classes into one class, the user 
assigns a name (up to eight characters) to the pooled class 
to be created and specifies the classes to be included in the 
pooled class. For example, assume there are eight classes 
available in the training statistics, and the user wishes to 
process the following combinations: 

POOLA (Pool A) will be classes 1 and 2. 

POOLB will be classes 4, 6, and 7. 

POOLC will be class 5 only. 

Classes 3 and 8 will be ignored. 

The control card format to specify this option will be: 

CLASSES POOLA(1/1,2/), POOLB(2/4,6,7/), POOLC(3/5/)

Note that the number immediately following a left parenthesis
specifies the pool sequence. Pool sequence numbers must be
in ascending order. Note also that the classes to be pooled
(and named) are enclosed by slashes (/).
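A hypothetical parser for the pooling syntax may make the card format easier to check. This is a sketch that assumes blanks between tokens are ignored and that pool names are one to eight letters or digits; `parse_classes_card` is an illustrative name, not a LARSYS routine.

```python
import re
from typing import Dict, List

# name(seq/members/) as in POOLA(1/1,2/); blanks around tokens are tolerated.
POOL_RE = re.compile(r"\b([A-Z0-9]{1,8})\s*\(\s*(\d+)\s*/([\d,\s]+)/\s*\)")


def parse_classes_card(card: str) -> Dict[str, List[int]]:
    """Map each (pooled) class to the training class numbers it contains.

    For a plain selection such as 'CLASSES 1, 3, 5' the keys are the class
    numbers as strings; SECHO itself keeps the names from the Statistics File.
    """
    text = card.strip().upper()
    if not text.startswith("CLASSES"):
        raise ValueError("not a CLASSES card")
    body = text[len("CLASSES"):]
    pools = POOL_RE.findall(body)
    if not pools:
        # Simple selection: each listed class stands alone.
        return {t: [int(t)] for t in re.split(r"[,\s]+", body) if t}
    result: Dict[str, List[int]] = {}
    last_seq = 0
    for name, seq, members in pools:
        # The guide requires pool sequence numbers in ascending order.
        if int(seq) <= last_seq:
            raise ValueError(f"pool sequence numbers must ascend (at {name})")
        last_seq = int(seq)
        result[name] = [int(m) for m in re.split(r"[,\s]+", members) if m]
    return result


print(parse_classes_card("CLASSES POOLA(1/1,2/), POOLB(2/4,6,7/), POOLC(3/5/)"))
# {'POOLA': [1, 2], 'POOLB': [4, 6, 7], 'POOLC': [5]}
```

Classes 3 and 8 do not appear in the result, matching the example in which they are ignored.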

When no 'CLASSES' card is specified, all the classes in
the statistics deck will be considered by the supervised ECHO
processor for both object identification and classification.

Specification of Annexation Parameter: The annexation parameter
is required for execution of the SECHO processor when the two ECHO
phases are to be run jointly or when the cell annexation phase
is to be run alone. The form of this card is:

ANNEXATION THRESHOLD (X.X)

where X.X is a floating point threshold for the generalized
likelihood ratio criterion used in annexing adjoining homogeneous
cells to fields. The higher the annexation threshold, the more
likely it is that annexation will occur.
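To illustrate the kind of test involved, the sketch below computes a textbook generalized likelihood ratio statistic for a single channel: -2 log(lambda) for the hypothesis that two samples share one normal distribution. It is an assumption that SECHO's statistic has this form or scale; with several channels the variances would become determinants of the covariance matrices. The function names are hypothetical, and the decision rule annexes when the statistic does not exceed the threshold, which matches the stated behaviour that a higher threshold makes annexation more likely.

```python
import math
from typing import Sequence


def mle_variance(xs: Sequence[float]) -> float:
    """Maximum likelihood (divide-by-n) variance of one channel."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)


def glr_statistic(a: Sequence[float], b: Sequence[float],
                  floor: float = 1e-6) -> float:
    """-2 log(lambda) for H0: samples a and b come from one normal population.

    Under H0 one mean and variance are fitted to the pooled sample; under
    the alternative each sample gets its own. The 2*pi and constant terms
    cancel, leaving only the log-variances. floor guards against log(0)
    for perfectly uniform samples.
    """
    n1, n2 = len(a), len(b)
    v1 = max(mle_variance(a), floor)
    v2 = max(mle_variance(b), floor)
    v0 = max(mle_variance(list(a) + list(b)), floor)
    return (n1 + n2) * math.log(v0) - n1 * math.log(v1) - n2 * math.log(v2)


def should_annex(a: Sequence[float], b: Sequence[float], threshold: float) -> bool:
    """Annex when the statistic does not exceed the threshold."""
    return glr_statistic(a, b) <= threshold
```

Two cells drawn from the same population give a statistic near zero; cells with well separated means give a large value (about 42.4 for the samples 10, 11, 12, 11 and 30, 31, 32, 31), so they are annexed only under a very permissive threshold.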

Specification of Cell Parameters: The cell width and homogeneity
parameters are required by the SECHO processor for execution of
the cell processing phase or joint execution of both SECHO phases.
These parameters are specified with a control card of the form:

CELL WIDTH (N), HOMOGENEITY (XX.X)

The width parameter represents the "width" of the cell in pixels.
Each cell is made up of N x N pixels (N columns by N lines). The
homogeneity parameter is used as a threshold for differentiating
homogeneous cells from singular (non-homogeneous) cells. As the
homogeneity parameter increases, the likelihood that a cell will
be identified as homogeneous increases.
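The sketch below shows one way a supervised homogeneity test could use the class statistics. It assumes diagonal covariance matrices and a statistic equal to the sum, over the cell's N x N pixels, of each pixel's squared distance to a class mean scaled by the class variance. These choices, the class names, and the functions `cell_statistic` and `screen_cell` are illustrative assumptions; the guide states only that a larger homogeneity parameter makes a cell more likely to be judged homogeneous, which the `<=` comparison reproduces.

```python
from typing import Dict, List, Sequence, Tuple

# (channel means, channel variances) for one training class; a diagonal
# covariance is an assumption made to keep the sketch short.
ClassStats = Tuple[Sequence[float], Sequence[float]]


def cell_statistic(cell: List[Sequence[float]], stats: ClassStats) -> float:
    """Sum over the cell's pixels of variance-scaled squared distances to the class mean."""
    means, variances = stats
    return sum((x - m) ** 2 / v
               for px in cell
               for x, m, v in zip(px, means, variances))


def screen_cell(cell: List[Sequence[float]],
                classes: Dict[str, ClassStats],
                homogeneity: float) -> Tuple[str, bool]:
    """Return (closest class, True if the cell counts as homogeneous)."""
    scores = {name: cell_statistic(cell, s) for name, s in classes.items()}
    best = min(scores, key=scores.get)
    return best, scores[best] <= homogeneity


classes = {"CORN": ([10.0], [1.0]), "WATER": ([50.0], [4.0])}  # hypothetical
print(screen_cell([(10.0,), (11.0,), (9.0,), (10.0,)], classes, 5.0))  # ('CORN', True)
print(screen_cell([(30.0,)] * 4, classes, 27.8))                       # ('WATER', False)
```

The second call mimics a cell whose spectral response matches no training class well: even its closest class scores far above the threshold, so the cell is flagged singular.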

Specification of Areas to be Classified: The user must provide
the cell processing phase (phase 1) of the supervised ECHO
processor with Field Description Cards to define the area
or areas to be classified. These are included in the input
deck following a DATA card. Either of two forms of this card
may be used; the formats are described in the Control Card
Dictionary for CLASSIFYPOINTS in Appendix I of the LARSYS User's
Manual [8]. These Field Description Cards identify the specific
portion of data from the Multispectral Image Storage
Tape that is to be classified. The information is used by
the processor to request the appropriate tapes and access
the desired segment(s) of the specified data run.

Outputs

Classification Results File: The principal output of the
Supervised ECHO Classifier is the Classification Results File,
which, in turn, is the primary input to other LARSYS functions:
PRINTRESULTS, COPYRESULTS, LISTRESULTS, and PUNCHSTATISTICS.

The location of this file must be specified when either the
single-step process (phases 1 and 2 executed jointly) or the
cell-to-cell annexation phase is to be executed. The location of
this file is not specified when only the cell processing phase
is to be executed. The file may reside on either tape or
disk, and the user must specify one or the other on a RESULTS
control card. However, if the user wishes to save the results
file, or if he wishes to use it as input to the LARSYS
LISTRESULTS or PUNCHSTATISTICS functions, he must place it on tape
or have it copied to tape by the COPYRESULTS function.

The user specifies where the Results File will reside by
using a RESULTS control card in one of three forms:

RESULTS TAPE (xxx), FILE (nn)
RESULTS INITIALIZE, TAPE (xxx)
RESULTS DISK

The first control card is used to add the file to a tape
already containing Classification Results Files. If a file
of the specified number already exists on the tape, the user
will be notified by a message. He then has the option of
writing over the old file, specifying a new tape and file
number, or stopping execution.

The second 'RESULTS' control card example specifies that
a new results tape is to be used, and the 'INITIALIZE' parameter
requests that the proper header information be placed at the
beginning of the new tape. A tape must always be initialized
before it can be used to store classification results.

Execution of the third 'RESULTS' control card would cause
the Classification Results File to be written on the disk.

When the Classification Results File is placed on disk, it is
only stored there temporarily. If the user wishes to save the
file, he must copy it to tape with the COPYRESULTS function.

Any of the actions listed below will cause the Classification
Results File to be erased from the disk by the system:

• Another execution of a classification function.

• Re-initiation of LARSYS, i.e., issuing the 'I LARSYS'
control command.

• Logging off the system, i.e., issuing the 'QUIT' control
command.
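The three RESULTS forms can be summarized by a small, hypothetical interpreter. It is a sketch, not the LARSYS card reader: the `Destination` fields and `parse_destination_card` are invented names, and the optional `keyword` argument reflects the guide's statement that the 'INTERMEDIATE' card takes the same forms except that DISK is not a valid location.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Destination:
    medium: str                  # "TAPE" or "DISK"
    tape: Optional[str] = None   # tape number from TAPE (xxx)
    file: Optional[int] = None   # file number from FILE (nn), if given
    initialize: bool = False     # True for the INITIALIZE form


def parse_destination_card(card: str, keyword: str = "RESULTS") -> Destination:
    """Interpret the three RESULTS card forms; keyword='INTERMEDIATE' rejects DISK."""
    text = " ".join(card.upper().split())  # collapse runs of blanks
    if not text.startswith(keyword):
        raise ValueError(f"not a {keyword} card")
    if text == f"{keyword} DISK":
        if keyword == "INTERMEDIATE":
            raise ValueError("DISK is not a valid location for intermediate results")
        return Destination("DISK")
    tape = re.search(r"TAPE\s*\(\s*(\w+)\s*\)", text)
    if tape is None:
        raise ValueError(f"unrecognized {keyword} card: {card!r}")
    file_no = re.search(r"FILE\s*\(\s*(\d+)\s*\)", text)
    return Destination(
        medium="TAPE",
        tape=tape.group(1),
        file=int(file_no.group(1)) if file_no else None,
        initialize="INITIALIZE" in text,
    )


print(parse_destination_card("RESULTS INITIALIZE, TAPE (123)"))
# Destination(medium='TAPE', tape='123', file=None, initialize=True)
```

Returning a record rather than acting on the tape keeps the sketch testable; a real run would go on to check for an existing file number and prompt the user as described above.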

A unique "classification study number", based on the date
and time of the run, is part of each Results File. The number,
identified as "classification study", is included on any
outputs that are subsequently derived from the results file.

The form of the identification number is "ydddsssss", where
y is the last digit of the year, ddd is the Julian date (day of the
year, 001-365), and sssss is the total number of seconds since
the previous midnight.
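The identification number can be reproduced from a run's date and time. The sketch below assumes the run time is available as a Python `datetime`; the function name is hypothetical.

```python
from datetime import datetime


def classification_study_number(run_time: datetime) -> str:
    """Build the 'ydddsssss' identifier: last digit of the year, day of
    the year (001-366), and seconds since the previous midnight."""
    y = run_time.year % 10
    ddd = run_time.timetuple().tm_yday
    sssss = run_time.hour * 3600 + run_time.minute * 60 + run_time.second
    return f"{y}{ddd:03d}{sssss:05d}"


print(classification_study_number(datetime(1977, 5, 3, 1, 0, 0)))  # 712303600
```

Because only the last digit of the year is kept, runs made at the same moment ten years apart would receive the same study number.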

The principal data on the Classification Results File is the
categorization of each input point made during the classification
run. A separate record is written for each line of the
classification. This record contains, for each point in the
line, the class number associated with the class to which the
point was assigned. The likelihood code, which is set by the
LARSYS per-point classifier, is not assigned a value by the
SECHO processor. The classification results are used by the
PRINTRESULTS function to produce detailed maps of the
classified area as well as tables of the test fields, training
fields, and class performance. For more information on these
products, refer to the description of the PRINTRESULTS function
in the LARSYS User's Manual [8].

In addition to the classification results, the file contains
other data related to the classification run:

* A complete copy of the Statistics File that was used
as input to the run. This file may be punched on
cards by using the PUNCHSTATISTICS function.

* Summary information about the classification and the
channels and classes that were used. A formatted
listing of this information may be produced by using
the LISTRESULTS function. This listing is also a
secondary product of both the PUNCHSTATISTICS and
the COPYRESULTS functions.

* Reduced statistics (mean vectors and covariance matrices)
for the classes and channels used in the classification.


Intermediate Results File: A secondary output is the
Intermediate Results File, used only when cell processing and cell
annexation are to be performed independently by two separate
executions of the SECHO processor. The same control cards
are used for specifying the Intermediate Results location as
for specifying the classification results location, except that
the card is labeled 'INTERMEDIATE' rather than 'RESULTS' and
'DISK' is not a valid location. A tape file must be used
for Intermediate Results storage. The format of the
Intermediate Results File is similar to that of the Classification
Results File. The class categorizations and associated
probabilities which appear for each line of input data in the
Classification Results File (see the LARSYS System Manual [7]) are
replaced, in the supervised ECHO Intermediate Results File,
by the class numbers and cell likelihood values for each row
of N by N point cells. When processing is to be carried out
in two independent phases, the 'INTERMEDIATE' card must appear
in the control card decks of both the cell processing and the
cell annexation phases. The 'INTERMEDIATE' card identifies the
destination of the principal results of the cell processing
phase when that phase is executed independently. It identifies
the location of the principal input when the cell annexation
phase is executed.

Standard Printer Output: The supervised ECHO classifier always
prints a summary of the user's input deck. The summary includes
a reproduction of the input deck control cards, a list of
options the user has selected, and particular characteristics
of the run, such as the number of classes and channels used,
the channel numbers, etc. An example of this output is shown
in Figure 5.

In this case, the 'CARDS READSTATS' option indicates that
the Statistics Deck specifying the mean vectors and covariance
matrices of the training classes appears as part of the control
card deck. The 'PRINT SINGULAR' card causes a Singular Cell Map to
be generated. The absence of an 'INTERMEDIATE' card indicates
that both the cell processing and the cell-to-cell annexation
phases are to be executed.

Figure 5. SECHO Summary Information

Several items listed under "SUPERVISED ECHO CLASSIFIER
INFORMATION" in Figure 5 are of particular interest. The
list is always headed by the Classification Study Number (the
unique identification number for the particular classification).
The number of fields used to generate the statistics for the
classifier is given next. Note that in this case 30 fields
were used to generate the input Statistics File.

The last item in the list, ("CHANNELS, SELECTED ARE...") 
identifies the channels that will be used in the classification. 
If the user had included a CHANNELS control card in his input 
deck, the channels that were specified there would be listed. 

There are three other standard printer outputs. They are: 

1. A Classes and Channels Table. This shows the class
name for each of the training classes (as defined
in the Supervised ECHO Classifier input deck) and
the channel number, spectral band, and calibration
code for each channel (taken from the Statistics File).
A sample is shown in the attached Figure 6.

2. A Processing Parameters List. Figure 6 also contains
a list of the processing parameters. The cell width,
the annexation threshold, and the cell homogeneity
threshold are a recapitulation of control card inputs;
the number of channels and the number of pools result
from the information contained in the Statistics Deck.
These two parameters may be modified by the 'CLASSES'
and 'CHANNELS' control cards. The number of cell
lines which the program will hold, based on the other
input requirements (classes, length of line, size of
cell), is also given. This value must be at least 2.

Figure 6. Example SECHO Classes and Channels Table and Processing Parameter List

3. A Classification Run Identification Table. This
table shows the run information obtained from the
input tape ID record, the spectral band and calibration
code for each channel, and the coordinates of
the area to be classified. If a map is requested,
this table will be printed as a header for the map.

An example of this table appears as the header of the
example printer map (Figure 7) in the description of
optional printer output.

Optional Printer Output: Three optional printer outputs may
be selected with the PRINT control card:

1. Statistics Summary. This output is produced for
each of the classes (or pooled classes) used in the
classification. Its form and content are the same as
those of the output produced by the LARSYS STATISTICS
function, except that it covers only the channels that
are actually used in the classification. It shows, for
each of the classes, the mean and the standard deviation
of the response for each channel of data, and a
correlation matrix of the channels.

2. A Pictorial Classification Map. This map, generated
during the cell annexation phase of SECHO, is an image
of the entire classified area, with each point
represented by an alphanumeric symbol (a number,
character, or special symbol). Figure 7 presents the
classification map which results from the control
card input presented in Figure 5. Note that the standard
Run Identification Output appears as a header to the
Classification Map. The symbol used to represent
each class on the map is recorded on the Classes and
Channels Listing. By default, symbols are assigned to
each class (or pooled class) based solely on the class
number. Default assignments are as follows:

Class Number      Symbol

1 through 9       numbers 1 through 9
10 through 35     characters A through Z
36                number 0
37 through 44     symbols +, =, *, $, /, &, (, and )
45 through 53     numbers 1 through 9
54 through 60     characters A through G

Alternatively, the user may specify symbol assignments
by use of a 'SYMBOLS' control card. For example:

SYMBOLS A,A,A,B,W,A,

would cause the first, second, third, and sixth classes
to be represented by an A on the classification map,
the fourth class by a B, and the fifth class by a W.

More comprehensive and flexible mapping capabilities
are available through the LARSYS PRINTRESULTS function.
The reader should refer to the description of that
function in the LARSYS User's Manual [8] for an example
of PRINTRESULTS output.

Figure 7. Pictorial Classification Map

The user may use the PRINT control card to request
either or both of the outputs discussed to this point. A
'PRINT STATS' card will print only the statistics
summary, a 'PRINT CLASSIFICATION' card will print
only a map, and a 'PRINT STATS, CLASSIFICATION' card
will print both of them.

3. Singular Cell Map. This map is obtained from the cell
processing phase of the Supervised ECHO Classifier
function. Figure 8 is a Singular Cell Map of the same
area as that represented on the Pictorial Classification
Map in Figure 7. By applying the cell selection
threshold supplied in the input control cards,
nonhomogeneous cells are detected and screened out. The
singular cell map places a symbol ('0') at the coordinates
of each singular cell. Note that a character
on this map represents a cell of data, not a single
point. Hence, in Figure 8, since the cells are two
by two sized blocks of pixels, line and column headers
are incremented by two. This map is useful in detecting
a very nonhomogeneous area, too high a value for the
cell selection parameter, or classes missing from the
statistics information.

Large groups of contiguous singular cells will
occur when one or more spectral classes have been
omitted. For example, in Figure 8 there is a large
group of singular cells between lines 300 and 322
and columns 424 and 448. Part of a reservoir is contained
in this area. Though water is a class contained in
the statistics deck for this run, the statistics for
the water class were gathered over a river rather
than the reservoir. Statistics of the water in the
reservoir are different enough from the statistics
of the water in the river for these cells to appear
unrecognizable and hence to be identified as
singular. When the cell homogeneity parameter is
very high, no cells will be identified as singular.
Unless the analyst desires all cells to be classified
as small samples, a cell map with few symbols
indicates that the homogeneity parameter is too high.

Figure 8. Example SECHO Singular Cell Map

Only one map can be produced by a single execution
of the supervised ECHO classifier. Either a Classification
Map or the Singular Cell Map may optionally be produced,
but not both.
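The default symbol table and the 'SYMBOLS' override described under the Pictorial Classification Map can be expressed as a lookup. This is a sketch under two assumptions: the two special symbols for classes 43 and 44 are read as "(" and ")", and classes not covered by a SYMBOLS card keep their defaults. The function names are hypothetical.

```python
import string
from typing import List, Optional

SPECIAL = "+=*$/&()"  # classes 37 through 44, in table order


def default_symbol(class_number: int) -> str:
    """Default map symbol for classes 1 through 60."""
    if not 1 <= class_number <= 60:
        raise ValueError("default symbols cover classes 1 through 60")
    if class_number <= 9:
        return str(class_number)
    if class_number <= 35:
        return string.ascii_uppercase[class_number - 10]
    if class_number == 36:
        return "0"
    if class_number <= 44:
        return SPECIAL[class_number - 37]
    if class_number <= 53:
        return str(class_number - 44)  # digits 1 through 9 again
    return string.ascii_uppercase[class_number - 54]  # A through G


def symbol_table(n_classes: int, symbols_card: Optional[str] = None) -> List[str]:
    """Symbols for classes 1..n_classes, overridden in order by a SYMBOLS card.

    Classes beyond the last symbol on the card keep their defaults
    (an assumption; the guide does not say).
    """
    table = [default_symbol(k) for k in range(1, n_classes + 1)]
    if symbols_card is not None:
        text = symbols_card.strip()
        if not text.upper().startswith("SYMBOLS"):
            raise ValueError("not a SYMBOLS card")
        given = [s.strip() for s in text[len("SYMBOLS"):].split(",") if s.strip()]
        for i, sym in enumerate(given[:n_classes]):
            table[i] = sym
    return table


print(symbol_table(6, "SYMBOLS A,A,A,B,W,A,"))  # ['A', 'A', 'A', 'B', 'W', 'A']
```

Note that the default scheme reuses symbols (class 1 and class 45 both print as 1), so maps with more than 44 classes are easier to read with an explicit SYMBOLS card.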


NONSUPERVISED ECHO: FIELD EXTRACTION - PHASE 1 (NS1ECHO)

The NS1ECHO function is an implementation of the field
extraction phase of the nonsupervised Extraction and
Classification of Homogeneous Objects (ECHO) algorithm. It
partitions the data into N-by-N cells of pixels, performs
cell-to-cell annexation to form fields, computes statistics of
these fields, and saves the results on an Intermediate Results
Tape. In addition, this function creates an object map by
replacing the data vectors of those pixels identified as
falling within a field with a data vector of the channel means
of the field.

The program flags those cells which it identifies as "singular"
(containing pixels from more than one class). Information is
stored on the Intermediate Results Tape to be used later as
input to the nonsupervised ECHO Classification Phase (NS2ECHO
function).
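The object-map step described above (replacing each field pixel with the channel means of its field) can be sketched as follows. This is a minimal illustration in Python with NumPy, not the NS1ECHO code; the function name `object_map` and the convention that pixels of singular cells carry a field number of -1 are assumptions made for the example.

```python
import numpy as np

def object_map(image, field_of_pixel):
    """Replace each field pixel's data vector with its field's channel means.

    image          -- array of shape (lines, columns, channels)
    field_of_pixel -- int array of shape (lines, columns): the field number
                      of each pixel, or -1 (hypothetical convention) for
                      pixels of singular cells, which keep their own values.
    """
    out = np.asarray(image, dtype=float).copy()
    for f in np.unique(field_of_pixel):
        if f < 0:
            continue                          # singular-cell pixels unchanged
        mask = field_of_pixel == f            # all pixels belonging to field f
        out[mask] = out[mask].mean(axis=0)    # channel means of field f
    return out
```

For example, in a 2 x 2 image whose top row forms one field, both top-row pixels end up holding the average of their original values, while a singular pixel keeps its own data vector.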

Input to the function is: 

* Data from the Multispectral Image Storage Tape 

* Control cards to select the processing and output options. 

* A data deck containing a Field Description card to identify 
the area to be processed. 

The user has a great deal of control over the data to be
processed by means of the control cards. The results are
placed on an Intermediate Results Tape for later processing by
the nonsupervised ECHO Classification Phase. Note: The format
of the nonsupervised Intermediate Results File is not
compatible with the supervised ECHO Intermediate File format.
Intermediate results generated by the nonsupervised ECHO
processor should be kept on a separate tape from intermediate
results produced by the supervised ECHO processor. Besides
general information about the ECHO run, NS1ECHO produces an
optional field map. A detailed description of how this map is
requested appears below.

Inputs 

The main input to the NS1ECHO function is the Multispectral
Image Storage Tape. The function will use the identification
information on the Field Description Cards, along with the
system (or user) Runtable, to identify the appropriate input
tape and have it mounted. The content and form of this primary
LARSYS input file are described in Appendix IV of the LARSYS
System Manual [7].

In addition to the principal input, the user is expected 
to provide an input deck which further defines the data to be 
used, the processing parameters, and the input/output options. 

More specifically, he employs control cards to designate the
channels to be used, the annexation, cell selection, and cell
width parameters, and the intermediate tape, file, and run
number. He must also provide a data card (a LARSYS Field
Description Card) which specifies the area to be processed.

The sample input deck shown in Figure 9 illustrates the 
use of these inputs. The discussion that follows provides details 
about the specifications of these inputs. 
Figure 9

Example Control Cards for NS1ECHO Processor
Specification of Channels: The channels to be used by the
NS1ECHO function must be specified on a CHANNELS card. The
form is:

CHANNELS I, J, . . . 

where I, J, . . . are the channel numbers of the channels to 
be used. An example of the use of this card is shown in 
Section 3.1 of the LARSYS User's Manual [8]. Appendix IV of 
the User's Manual contains information on how this card may also 
be used to calibrate data from the Multispectral Image Storage 
Tape. 

Specification of Annexation Parameters: The annexation
parameters are required and must be specified on an ANNEXATION
card. The form of this card is:

ANNEXATION MEAN(X.XX), VARIANCE(Y.YY)

where X.XX and Y.YY are floating point numbers and represent
annexation thresholds for the mean and for the covariance
matrix, respectively. They must be one of the following values:
.1, .05, .025, .01, .005, .001. These parameters are used as
thresholds in comparisons between adjacent homogeneous cells.

A cell is annexed to a field if it passes both the mean
threshold test and the covariance threshold test. As the
annexation thresholds become smaller, the likelihood of
annexation increases.

Specification of Cell Parameters: The cell width and
homogeneity parameters are required and are supplied by means
of a CELL card. The form of this card is:

CELL WIDTH(N), HOMOGENEITY(Y.YY, Z.ZZ, ...)

The width parameter represents the "width" of a cell in pixels.
Each cell is made up of N x N pixels (N columns by N lines).
If not specified, the cell width defaults to 2. The cell
homogeneity parameter is a threshold for the ratio of cell
variance to cell mean.

If the variance divided by the mean of the cell is greater than
the homogeneity threshold for any selected channel, the cell is
split and each constituent pixel classified separately. The
cell homogeneity threshold can be any value. As the homogeneity
parameter increases, the likelihood that a cell will be
identified as 'singular' and its pixels classified individually
decreases. If only one homogeneity parameter is specified, it
will be applied to the ratio of cell variance to cell mean for
each requested channel. When two or more homogeneity parameters
are specified, the first threshold will correspond to the first
channel selected, the second threshold to the second selected
channel, and so on. When more thresholds than channels are
specified, the trailing thresholds are ignored; when more
channels than thresholds are requested, the last specified
threshold will be used for the trailing channels.
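A minimal Python sketch of how the threshold list could be matched to channels and applied to one cell follows. The function names are hypothetical, and the use of the unbiased sample variance (divisor s - 1) is an assumption, since the text does not say which divisor NS1ECHO uses. A channel whose cell mean is zero would need special handling that is not shown.

```python
import numpy as np

def channel_thresholds(thresholds, n_channels):
    """One homogeneity threshold per selected channel, following the CELL
    card rules: extra thresholds are ignored, and the last threshold is
    reused for any trailing channels."""
    if not thresholds:
        raise ValueError("at least one homogeneity threshold is required")
    padded = list(thresholds) + [thresholds[-1]] * n_channels
    return padded[:n_channels]

def is_singular(cell, thresholds):
    """cell: array of shape (pixels, channels) for one N x N cell.

    Returns True if variance / mean exceeds the channel's threshold in any
    channel. Assumes nonzero channel means and the (s - 1) variance divisor.
    """
    cell = np.asarray(cell, dtype=float)
    limits = np.asarray(channel_thresholds(thresholds, cell.shape[1]))
    ratio = cell.var(axis=0, ddof=1) / cell.mean(axis=0)
    return bool(np.any(ratio > limits))
```

A perfectly uniform cell has zero variance in every channel and so is never singular, whatever thresholds are supplied.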

Specification of Areas to be Classified: The user must provide
a Field Description Card to define the portion of the selected
LARSYS run that the field extraction phase of nonsupervised
ECHO is to process. This card follows the 'DATA' card in the
input card deck. Either of two forms of this card may be used.
The formats are described in the control card dictionary for
CLASSIFYPOINTS in Appendix I of the LARSYS User's Manual [8].

The Field Description Card identifies the specific portion of
data from the Multispectral Image Storage Tape that is to be
used. The information is used by the processor to request the
appropriate tapes and position those tapes so as to access the
requested lines and columns of the specified runs.

Optional Specification of Field Map: The user may request to
have a map printed showing the annexation of cells into fields
as well as singular cells. It is specified by the following
card:

PRINT MAP

In addition, this option sets up the intermediate tape for
conversion to a map tape output by replacing each individual
pixel value with the mean value of the field with which the
pixel is associated. This option will cause computer time to
increase, so it should be used only when an object (field) map
is desired.

Outputs 

Intermediate Results File: The principal output of the NS1ECHO
function is the Intermediate Results File, which is, in turn,
the primary input to the NS2ECHO function. The file must reside
on the tape specified by the user on the INTERMEDIATE card.
The nonsupervised Intermediate Results File is not compatible
with, and may not reside on, a LARSYS Classification Results
Tape.

The user must specify where the file is to be stored by
using an 'INTERMEDIATE' control card in one of two forms:

INTERMEDIATE NEWRUN(XXXXXXXX), TAPE(YYYY), FILE(ZZ)
INTERMEDIATE NEWRUN(XXXXXXXX), TAPE(YYYY), INITIALIZE

The first control card is used to place the file on a tape
already containing Intermediate Files. If a file of the
specified number already exists on the tape, the user will be
notified by a message. He then has the option of writing over
the old file, specifying a new tape and file, or stopping
execution. The second control card specifies that a new tape
be used, and the 'INITIALIZE' parameter requests that the proper
header information be placed at the beginning of the new tape
before a new file is written. A new tape must always be
initialized before it can be used to store intermediate
results. The NEWRUN parameter specifies a unique eight-digit
number to be placed in the run slot on the file ID record. In
addition, point-by-point means of annexed fields (or original
data values if the cell was singular) and an array which gives
a field number for each cell are contained on the Intermediate
Results File. The nonsupervised Intermediate Results File also
contains statistics for each of the homogeneous fields
identified. These statistics are used in NS2ECHO to sample
classify the fields.

Standard Printer Output: The NS1ECHO function always prints a
summary of the user's input deck. The summary includes a
reproduction of the input deck and the set of parameters
selected. This set of information includes the cell width, the
number of channels, and the annexation and homogeneity
parameters. Figure 10 shows an example of this output for the
control cards appearing in Figure 9.

Optional Printer Output: The Field Map is an optional printer
output which may be selected by means of the 'PRINT' card.
Figure 10

Standard Printer Output for NS1ECHO 
This output is a map showing how cells were annexed into
individual fields. Each field which the nonsupervised ECHO
processor identifies is arbitrarily assigned a symbol. Singular
(nonhomogeneous) cells are assigned blanks. Figure 11 shows an
example Field Map which was generated by adding a 'PRINT MAP'
control card to the control card deck listed in Figure 9.
NONSUPERVISED ECHO: CLASSIFICATION - PHASE 2

(NS2ECHO) 

The NS2ECHO function is an implementation of the
classification phase of the nonsupervised Extraction and
Classification of Homogeneous Objects (ECHO) algorithm. It
performs maximum likelihood sample classification of objects
that were identified during the nonsupervised field extraction
phase (NS1ECHO) and a point-by-point maximum likelihood
classification of the constituent points of cells which were
labeled singular by the NS1ECHO function. After performing the
classification, it writes the results on a Classification
Results File to be printed later.

Input to the function is:

* An Intermediate Results Tape containing statistics and
portions of the fields identified by NS1ECHO and the
data vectors of pixels from singular (non-homogeneous)
cells.

* Control cards to select the processing options.

* A Statistics File containing the statistical description
of the training classes.

The principal output is a LARSYS Classification Results File,
which is placed on tape. This file is normally used as input to
the LARSYS PRINTRESULTS function for production of a variety of
printed map and tabular outputs for display of results and
evaluation of the classification. The Classification Results
File is also the primary input to the LARSYS COPYRESULTS,
LISTRESULTS, and PUNCHSTATISTICS functions.
Inputs

The principal inputs to the nonsupervised ECHO classification
function (NS2ECHO) are the nonsupervised Intermediate Results
File, which has been produced by the nonsupervised ECHO field
extraction algorithm, and the LARSYS Statistics File. The
Statistics File must be included as card deck input to this
function. In addition to the Statistics File, the user must
provide an input deck designating the location of the
Intermediate Results File and the desired destination of the
Classification Results. An example of the use of the control
cards and the correct location for the LARSYS Statistics File
is shown in Figure 12.

Specification of Intermediate Results Location: The user must
specify the tape and file containing the Intermediate Results.
This is done by means of an INTERMEDIATE card. The form of
this card is:

INTERMEDIATE TAPE(XXX), FILE(YY)

where XXX is the number of an Intermediate Results Tape and YY
is the file containing the desired results. Note: Only
Intermediate Results Files produced by the nonsupervised ECHO
field extraction algorithm (NS1ECHO) may be used by NS2ECHO.

Optional Selection of Training Classes: The user may select
the training classes from the Statistics File that are to be
used by nonsupervised ECHO's classification phase (phase 2), and
he may combine training classes into pools. These options are
exercised by using the 'CLASSES' control card. For example,
if the user wished to use only classes 1, 3, and 5 of seven
training classes previously defined, the control card entry
would be:

CLASSES 1, 3, 5

In this case, the class names assigned by the statistics
function to classes 1, 3, and 5 will be retained by NS2ECHO and
the other classes will be totally ignored.

Figure 12

Example Control Cards for the Nonsupervised ECHO
Classification Phase

To combine two or more classes into one class, the user 
assigns a name (up to eight characters) to the pooled class to 
be created and specifies the classes to be included in the 
pooled class. For example, assume there are eight classes 
available in the training statistics, and the user wishes to 
process the following combinations: 

• POOLA (Pool A) will be classes 1 and 2. 

• POOLB will be classes 4, 6, and 7. 

• POOLC will be class 5 only. 

• Classes 3 and 8 will be ignored. 

The control card format to specify this option will be: 

CLASSES POOLA(1/1,2/),POOLB(2/4,6,7/),POOLC(3/5/)

Note that the number immediately following a left parenthesis
specifies the pool sequence. Pool sequence numbers must be in
ascending order. Note also that the classes to be pooled (and
named) are enclosed by slashes (/).
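To make the pooled-class syntax concrete, the hypothetical parser below turns a pooled CLASSES card into (sequence, name, classes) triples and checks that the sequence numbers ascend. It is a sketch of the card structure as described in this section, not LARSYS code, and it does not handle the simple unpooled form (CLASSES 1, 3, 5).

```python
import re

# One pool: NAME(seq/class,class,.../), allowing stray blanks
POOL = re.compile(r"(\w+)\s*\(\s*(\d+)\s*/([^/]*)/\s*\)")

def parse_pools(card):
    """Parse a pooled CLASSES card into [(sequence, name, [classes]), ...]."""
    keyword, _, body = card.strip().partition(" ")
    if keyword != "CLASSES":
        raise ValueError("not a CLASSES card")
    pools = []
    for name, seq, members in POOL.findall(body):
        classes = [int(c) for c in members.replace(" ", "").split(",") if c]
        pools.append((int(seq), name, classes))
    seqs = [p[0] for p in pools]
    if any(a >= b for a, b in zip(seqs, seqs[1:])):
        raise ValueError("pool sequence numbers must be in ascending order")
    return pools
```

Applied to the example card above, this yields POOLA with classes 1 and 2, POOLB with classes 4, 6, and 7, and POOLC with class 5; classes 3 and 8 simply do not appear.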

When no 'CLASSES' card is specified, all the classes in 
the statistics deck will be considered by the nonsupervised 
ECHO processor. 

Specification of Class Statistics: Class statistics must be
supplied to the nonsupervised ECHO classification phase before
classification may proceed. Unlike the supervised ECHO
classifier and the LARSYS CLASSIFYPOINTS algorithm, the
nonsupervised ECHO processor requires the Statistics File to be
provided in the control card file. The LARSYS Statistics File
is inserted into the input deck immediately before the 'END'
card. The Statistics File must be preceded by a 'DATA' card
(see Figure 12).

Outputs 

Classification Results File: The principal output of the
NS2ECHO function is the Classification Results File, which is,
in turn, the primary input to four other LARSYS functions:
PRINTRESULTS, COPYRESULTS, LISTRESULTS, and PUNCHSTATISTICS.
The user must specify where this file will be stored by using
a 'RESULTS' control card in one of two forms:

RESULTS TAPE(xxx), FILE(nn)

RESULTS INITIALIZE, TAPE(xxx)

The first control card is used to add the file to a tape already 
containing Classification Results Files. If a file in the 
specified destination already exists on the tape, the user will 
be notified by a message. He then has the option of writing 
over the old file, specifying a new tape and file number, or 
stopping execution. The second control card specifies that a 
new results tape be used, and the 'INITIALIZE' parameter requests 
that the proper header information be placed at the beginning 
of the new tape so a file may be written. A new tape must 
always be initialized before it can be used to store classifi- 
cation results. 
A unique "Classification Study Number", based on the date and
time of the run, is part of each Classification Results File.
The number, identified as "Classification Study", is included
on any outputs that are subsequently derived from the results
file. The form of the identification number is "ydddsssss",
where y is the last digit of the year, ddd is the Julian date
(day of the year, 001-366), and sssss is the total number of
seconds since the previous midnight.
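As an illustration of the "ydddsssss" layout, the Python function below (a hypothetical helper, not part of LARSYS) builds the number from a date and time. It assumes that the run's local clock time is what the program records.

```python
from datetime import datetime

def classification_study_number(t):
    """'ydddsssss': last digit of the year, day of the year (001-366),
    and seconds elapsed since the previous midnight."""
    seconds = t.hour * 3600 + t.minute * 60 + t.second
    return f"{t.year % 10}{t.timetuple().tm_yday:03d}{seconds:05d}"

print(classification_study_number(datetime(1977, 2, 1, 1, 0, 5)))  # 703203605
```

Here 1 February is day 032 of the year and 01:00:05 is 3605 seconds after midnight, so the result reads 7, 032, 03605.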

The principal data on the file are the class assignments 
for each point of the classification run. A separate record is 
written for each data line classified. This record contains, 
for each point in the line, the class number associated with 
the class to which the point was assigned. These classification 
results are used by the PRINTRESULTS function to produce de- 
tailed maps of the classified area as well as tables of the test 
fields, training fields, and class performance. For more infor- 
mation on these products, refer to the description of the 
Printresults function in the LARSYS User's Manual [8].

In addition to the point-by-point classification results, 
the file contains other data related to the classification run: 

* A complete copy of the Statistics File that was used as 
input to the run. This file may be punched on cards by 
using the LARSYS Punchstatistics function. 

* Summary information about the classification and the
channels and classes which were used. A formatted listing of
this information may be produced by using the Listresults
function. This listing is also a secondary product of both
the Punchstatistics and the Copyresults functions.
* Results statistics (mean vectors and covariance matrices)
for the classes and channels used in the classification.

Standard Printer Output: Figure 13 presents the standard printer
output produced by the classification phase of the nonsupervised
ECHO processor. The nonsupervised ECHO classification phase
(phase 2) has only two printer outputs: a reproduction of the
user's control card deck and a summary of the particular
characteristics of the classification (the Classification Study
Number, the number of pooled classes, the number of channels,
the number of fields, and the channels selected).


Figure 13 

Standard Printer Output for the Classification Phase of Nonsupervised ECHO (NS2ECHO)
A DISCUSSION OF THE ECHO ALGORITHMS


The following material assumes that the reader is already 
aware of the general nature of the ECHO process, including data 
and parameter inputs required and the outputs produced by the 
programs which are discussed. 

Background 

As we have noted, the ECHO process consists of two phases: 
object finding and sample classification. Furthermore, there 
are both "supervised" and "nonsupervised" versions of the pro- 
cess, the principal difference in the two versions being deter- 
mined by whether or not a set of precalculated class statistics 
is used in the object-finding phase. The purpose of this section 
is to outline the mathematical basis for the supervised ECHO 
process and to describe its implementation in the form of an 
algorithm compatible with LARSYS-like data analysis. In a later 
section we shall do the same for the unsupervised ECHO process. 

In all that follows, it is implicitly assumed that the class- 
conditional density functions are multivariate normal; i.e., for 
the ith class and for pixel vector X, the n-variate probability 
density function can be written as: 


$$p(X \mid \omega_i) = \frac{1}{(2\pi)^{n/2}\,|K_i|^{1/2}} \exp\left[-\tfrac{1}{2}(X - M_i)^T K_i^{-1} (X - M_i)\right]$$

where

$K_i$ is the covariance matrix for class $\omega_i$
$M_i$ is the mean vector for class $\omega_i$
n is the dimensionality of the data (pixel vector X).
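The density above is normally evaluated in logarithmic form. The following Python/NumPy sketch computes ln p(X|ω_i) for one pixel; the function name is an assumption, and `numpy.linalg.slogdet` and `numpy.linalg.solve` are used in place of an explicit determinant and inverse for numerical stability.

```python
import numpy as np

def log_density(x, mean, cov):
    """ln p(X | w_i) = -1/2 ln|2 pi K_i| - 1/2 (X - M_i)^T K_i^{-1} (X - M_i)."""
    x, mean, cov = (np.asarray(a, dtype=float) for a in (x, mean, cov))
    d = x - mean
    sign, logdet = np.linalg.slogdet(2 * np.pi * cov)   # ln|2 pi K_i|
    if sign <= 0:
        raise ValueError("covariance matrix must have a positive determinant")
    return float(-0.5 * logdet - 0.5 * d @ np.linalg.solve(cov, d))
```

Writing the normalizing constant as ln|2πK_i| folds the (2π)^(n/2) and |K_i|^(1/2) factors into a single term, which is the form used in the log-likelihood expressions later in this section.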
In general, the covariance matrices and mean vectors will
be estimated from collections of pixels assumed to belong to a
given class.

It will also be assumed that the data from adjacent or 
nearby pixels are class-conditionally independent. This will 
allow the joint probability density function for a collection 
of such pixels, all assumed to belong to the same class, to be 
written in product form: 

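$$p(X \mid \omega_{i_1}, \ldots, \omega_{i_s}) = p(X_1 \mid \omega_{i_1})\, p(X_2 \mid \omega_{i_2}) \cdots p(X_s \mid \omega_{i_s})$$

where $X = \{X_1, X_2, \ldots, X_s\}$ is such a collection (sample)
consisting of s pixels belonging respectively to classes
$\omega_{i_1}, \omega_{i_2}, \ldots, \omega_{i_s}$.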

Supervised Object Finding (SECHO): The object-finding process
is in itself a two-phase process. In the first phase, referred
to as "cell selection", the scene is partitioned into a
rectangular grid of small groups of pixels, called "cells". As
implemented in ECHO, each group or cell is a square with N
pixels on a side (N is an input to the program). To remain a
cell, the group must satisfy a statistical homogeneity
criterion, described in more detail below. A cell failing to
satisfy this criterion is called "singular", and its pixels
will be classified individually.

The supervised cell selection homogeneity test used in ECHO
is performed as follows. Define the quantity

$$Q_j(Y) = \sum_{i=1}^{s} (Y_i - M_j)^T K_j^{-1} (Y_i - M_j) = \mathrm{tr}\left(K_j^{-1} \sum_{i=1}^{s} Y_i Y_i^T\right) - 2 M_j^T K_j^{-1} \sum_{i=1}^{s} Y_i + s\, M_j^T K_j^{-1} M_j$$

where

$Y_i$ is the ith pixel vector in the cell being tested
s is the number of pixels in the cell ($s = N^2$)
$K_j$ is the sample covariance matrix for the jth training class
$M_j$ is the sample mean vector for the jth training class.

This quadratic form is a measure of the statistical distance
of the collection of data contained in the cell from the
distribution of the training data for the jth class. Now let
$\omega^*$ be the class for which the "log-likelihood" of the
cell is maximum; i.e.,

$$\ln p(Y \mid \omega^*) = \max_j \ln p(Y \mid \omega_j) = \max_j \left[-\tfrac{s}{2} \ln|2\pi K_j| - \tfrac{1}{2} Q_j(Y)\right]$$

and let $Q^*(Y)$ be the value of the corresponding quadratic form.
A cell is defined to be singular (and its pixels will be
classified individually) if $Q^*(Y) > c$, where c is a
user-specified threshold value. Otherwise, we accept the
hypothesis that the cell Y is homogeneous and treat it as a unit.
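The sketch below evaluates Q_j(Y) both directly and from the sums that appear in the second form of the expression, then applies the singular-cell rule for a given threshold c. It is an illustrative Python/NumPy sketch with hypothetical function names, not the SECHO implementation.

```python
import numpy as np

def quad_form_direct(Y, M, K):
    """Q_j(Y) = sum_i (Y_i - M_j)^T K_j^{-1} (Y_i - M_j); Y has shape (s, n)."""
    Y, M, K = (np.asarray(a, dtype=float) for a in (Y, M, K))
    D = Y - M                                   # one deviation row per pixel
    return float(np.sum(D * np.linalg.solve(K, D.T).T))

def quad_form_sums(Y, M, K):
    """Same value from sum(Y_i Y_i^T) and sum(Y_i), as in the trace form."""
    Y, M, K = (np.asarray(a, dtype=float) for a in (Y, M, K))
    Kinv = np.linalg.inv(K)
    s = Y.shape[0]
    S1 = Y.sum(axis=0)                          # sum of pixel vectors
    S2 = Y.T @ Y                                # sum of outer products
    return float(np.trace(Kinv @ S2) - 2 * M @ Kinv @ S1 + s * M @ Kinv @ M)

def cell_is_singular(Y, means, covs, c):
    """Pick w* maximizing -(s/2) ln|2 pi K_j| - Q_j/2, then test Q*(Y) > c."""
    s = len(Y)
    best_ll, best_q = -np.inf, None
    for M, K in zip(means, covs):
        K = np.asarray(K, dtype=float)
        q = quad_form_sums(Y, M, K)
        ll = -0.5 * s * np.linalg.slogdet(2 * np.pi * K)[1] - 0.5 * q
        if ll > best_ll:
            best_ll, best_q = ll, q
    return best_q > c
```

The sums S1 and S2 of a cell need to be formed only once and can then be reused for every training class.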

This criterion has the particular advantage that it tends 
to "reject" not only inhomogeneous cells, but "unrecognizable" 
cells as well (cells very unlikely to belong to any of the 
training classes). Another advantage is that the computations 
involved are particularly compatible with the supervised annexa- 
tion criterion and the maximum likelihood sample classifier. 

Also of importance, the values of $Q_j(Y)$ for a cell drawn
from class $\omega_j$ can be shown to be chi-squared distributed
with $sn$ degrees of freedom. This fact is used in determining
appropriate values of the threshold parameter c.
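Because Q_j(Y) follows a chi-squared distribution with sn degrees of freedom, one way to choose c is to pick a significance level α and take the upper-α quantile of that distribution. The standard-library Python sketch below does this by bisection on a series evaluation of the chi-squared tail probability. The choice of α and the helper names are assumptions for illustration; the chi-squared result is exact only when the class statistics are known rather than estimated.

```python
import math

def chi2_sf(q, dof):
    """P(chi-squared with `dof` degrees of freedom > q), from the power
    series for the regularized lower incomplete gamma function."""
    if q <= 0:
        return 1.0
    a, x = dof / 2.0, q / 2.0
    term = total = 1.0 / a
    k = 0
    while term > total * 1e-15:
        k += 1
        term *= x / (a + k)
        total += term
    lower = total * math.exp(-x + a * math.log(x) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

def chi2_threshold(alpha, dof):
    """Upper-alpha quantile of chi-squared(dof), found by bisection."""
    lo, hi = 0.0, 1.0
    while chi2_sf(hi, dof) > alpha:
        hi *= 2.0                       # widen until the quantile is bracketed
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if chi2_sf(mid, dof) > alpha:
            lo = mid
        else:
            hi = mid
    return hi

# Hypothetical setting: 2 x 2 cells (s = 4) and n = 4 channels, so sn = 16
c = chi2_threshold(0.05, 4 * 4)
```

Larger cells or more channels raise sn and therefore the threshold needed for the same significance level.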

In the second object-finding phase, called "annexation", a 
cell is compared to an adjacent "field", which is simply a group 
of one or more spatially connected cells which have already been
merged. If the two samples are statistically similar, according
to a test we shall detail below, then the cell is merged or
"annexed" into the field. Otherwise the cell is compared to
another adjacent field, if one exists, or it becomes a new 
field by itself. 

In ECHO, the supervised annexation similarity test is based
on the statistic

$$\Lambda = \frac{\max_i p(X \mid \omega_i)\, p(Y \mid \omega_i)}{\max_i p(X \mid \omega_i)\, \max_j p(Y \mid \omega_j)}$$

where X is the collection of pixels forming the field and Y is
the collection of pixels in the cell. Notice that Λ has a value
between 0 and 1. It equals 1 when $p(X \mid \omega_i)$ and
$p(Y \mid \omega_i)$ attain their maximum values for the same class.

Thus the annexation criterion may be stated as follows:
The cell is assumed to belong to the same class as the field and
is annexed to the field if Λ > T, where T is a threshold value
(0 < T < 1). Otherwise the cell is considered significantly
different from the field and no annexation takes place.

For purposes of computational efficiency it is preferable
to work with the logarithm of Λ. This not only converts the
statistic into a difference of sums (rather than a quotient of
products) but also simplifies computation of $p(X \mid \omega_i)$,
etc., under the multivariate normal assumption noted earlier. We
restate the annexation criterion as: Assume the cell belongs
to the same class as the field and annex the two provided
$-\log \Lambda < t$, where t is a user-specified threshold value
(t > 0).

Note that t can be related to the parameter T by the expression
$T = 10^{-t}$.
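A Python/NumPy sketch of the log-form annexation test follows. It computes the log-likelihood of the field X and of the cell Y under each training class, forms ln Λ as a difference of maxima, and converts it to base 10 so that the comparison with t corresponds to Λ > 10^-t. The function names are hypothetical, and the base-10 convention is an assumption taken from the stated relation between t and T.

```python
import math
import numpy as np

def sample_loglik(Z, M, K):
    """ln p(Z | w_j) = -(s/2) ln|2 pi K_j| - Q_j(Z)/2 for a sample Z (s x n)."""
    Z, M, K = (np.asarray(a, dtype=float) for a in (Z, M, K))
    D = Z - M
    q = np.sum(D * np.linalg.solve(K, D.T).T)
    return -0.5 * len(Z) * np.linalg.slogdet(2 * np.pi * K)[1] - 0.5 * q

def neg_log10_lambda(X, Y, means, covs):
    """-log10(Lambda) for field X and cell Y; 0 when both favor one class."""
    lx = np.array([sample_loglik(X, M, K) for M, K in zip(means, covs)])
    ly = np.array([sample_loglik(Y, M, K) for M, K in zip(means, covs)])
    log_lambda = np.max(lx + ly) - (np.max(lx) + np.max(ly))  # natural log, <= 0
    return float(-log_lambda / math.log(10))

def annex(X, Y, means, covs, t):
    """Annex the cell to the field if -log10(Lambda) < t."""
    return neg_log10_lambda(X, Y, means, covs) < t
```

Working with sums of log-likelihoods avoids forming the very small products p(X|ω_i)p(Y|ω_i) directly, which is the efficiency argument made above.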
Maximum Likelihood Sample Classification (SECHO and NS2ECHO):
Regardless of whether the object-finding method used was super- 
vised or nonsupervised, the resulting objects are always 
classified by a supervised classification rule. In other words, 
training class statistics must be provided for use by the classi- 
fication rule. 

Therefore, let $K_j$ and $M_j$ be the covariance matrix and mean
vector, respectively, for the jth training class (estimated from
training data), and let s be the number of pixels in an object
to be classified. The maximum likelihood sample classification
rule is:

Decide $X = \{X_1, X_2, \ldots, X_s\}$ belongs to class $\omega^*$ if
and only if

$$p(X \mid \omega^*) = \max_j p(X \mid \omega_j)$$

or equivalently

$$\ln p(X \mid \omega^*) = \max_j \ln p(X \mid \omega_j).$$

Under the assumption noted earlier of class-conditional
independence of pixels within an object, we have

$$p(X \mid \omega_j) = p(X_1 \mid \omega_j)\, p(X_2 \mid \omega_j) \cdots p(X_s \mid \omega_j)$$

or

$$\ln p(X \mid \omega_j) = \sum_{k=1}^{s} \ln p(X_k \mid \omega_j).$$

Taking into account the multivariate normal assumption, this
becomes, after some manipulation:

$$\ln p(X \mid \omega_j) = -\tfrac{s}{2} \ln|2\pi K_j| - \tfrac{1}{2} Q_j(X) = -\tfrac{1}{2}\mathrm{tr}(K_j^{-1} S_2) + M_j^T K_j^{-1} S_1 - \tfrac{s}{2} M_j^T K_j^{-1} M_j - \tfrac{s}{2} \ln|2\pi K_j|$$

where

$$S_1 = \sum_{i=1}^{s} X_i \quad \text{and} \quad S_2 = \sum_{i=1}^{s} X_i X_i^T,$$

the sums taken over all pixels in the object to be classified.
Notice that $S_1$ is a vector and $S_2$ is a matrix.

Expressed in this way, the first two terms of the
"log-likelihood" depend on both the data to be classified and
the training statistics, whereas the third and fourth terms
depend only on the training statistics (apart from the factor
s). Thus the quantities $M_j^T K_j^{-1} M_j$ and $\ln|2\pi K_j|$ need
to be evaluated only once per class, whereas the first two terms
must be re-evaluated for each object (or individual pixel) to be
classified.

The expression above for the log-likelihood is perfectly 
valid for the case s = 1. It provides the computation necessary 
for classifying the individual pixels resulting from cells which 
fail to pass the cell selection homogeneity test. 
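The four-term form lends itself to precomputing per-class constants. The Python/NumPy sketch below classifies one object from its sums S_1 and S_2 and, as noted above, works unchanged for a single pixel (s = 1). The function name is hypothetical, and this is an illustration of the rule rather than the SECHO or NS2ECHO code.

```python
import numpy as np

def sample_classify(X, means, covs):
    """Return (index of w*, per-class log-likelihoods) for an object X (s x n).

    Uses: -1/2 tr(K^-1 S2) + M^T K^-1 S1 - (s/2) M^T K^-1 M - (s/2) ln|2 pi K|.
    """
    X = np.atleast_2d(np.asarray(X, dtype=float))
    s = X.shape[0]
    S1, S2 = X.sum(axis=0), X.T @ X            # the only data-dependent inputs
    lls = []
    for M, K in zip(means, covs):
        M, K = np.asarray(M, dtype=float), np.asarray(K, dtype=float)
        Kinv = np.linalg.inv(K)
        # Per-class constants: in practice these would be computed once.
        mkm = M @ Kinv @ M
        logdet = np.linalg.slogdet(2 * np.pi * K)[1]
        ll = (-0.5 * np.trace(Kinv @ S2) + M @ Kinv @ S1
              - 0.5 * s * mkm - 0.5 * s * logdet)
        lls.append(float(ll))
    return int(np.argmax(lls)), lls
```

Pixels from a singular cell can be passed one at a time as 1 x n arrays, which gives the point-by-point classification described above.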

Nonsupervised Object Finding (NS1ECHO): It was noted in the
discussion of the supervised ECHO algorithms that the only
difference between the supervised and nonsupervised ECHO
processes is in the approach used for object finding. The
supervised ECHO process utilizes predetermined class statistics
in partitioning the image data into objects. The nonsupervised
process must accomplish the partitioning without benefit of
predetermined class statistics.* Both processes utilize the same
maximum likelihood sample classification algorithm.


*Since the supervised object-finding process uses more a
priori knowledge about the data, it might be expected that it
would perform somewhat more reliably than the nonsupervised
version. In fact, this has been demonstrated experimentally
[Kettig, R. L. and D. A. Landgrebe, "Classification of
Multispectral Image Data by Extraction and Classification of
Homogeneous Objects," IEEE Trans. Geoscience Electronics,
vol. GE-14, no. 1, January 1976].
The nonsupervised object-finding process, like the supervised
version, is a two-phase process involving "cell selection"
and "annexation". In the cell selection phase, the scene is
partitioned into a rectangular grid of small groups of pixels,
called "cells". Each cell is a square with N pixels on a
side (the cell width, N, is an input to the program). To
remain a cell, the group must satisfy a statistical homogeneity
criterion (described below). A cell failing to satisfy this
criterion is called "singular", and its pixels will be
classified individually.

The nonsupervised cell selection homogeneity test used in 
ECHO is quite simple. The sample variance of the data in each 
channel divided by the corresponding channel mean is compared 
to a user-specified threshold which is an input to the program. 
If the threshold is exceeded in any channel, the cell is consi- 
dered singular and its pixels dealt with accordingly, i.e., 
classified individually. Although more powerful statistical 
tests have been investigated for cell selection purposes, none 
have been found more effective than the one described here. 
Furthermore, the more powerful tests often impose undesirable 
requirements on the minimum usable cell size. 

In the annexation phase of the nonsupervised object-finding 
process, a cell is compared to an adjacent "field", which is 
simply a group of one or more spatially connected cells which 
have already been merged. If the two samples are statistically 
similar, according to a test described below, then the cell is 
merged or annexed into the field. Otherwise, the cell is 
compared to another adjacent field, if one exists, or it becomes
a new field by itself. 

The test implemented for annexation in ECHO is a "multi- 
univariate" test rather than a truly multivariate test. That 
is, the test is based on examining sequentially the statistics 
associated with each data channel rather than examining the 
multivariate statistics for all channels combined. Extensive 
testing has shown that this approach is best when the cell size 
is small, because the number of pixels in the cell may not be 
sufficient to provide a good estimate of the multivariate sta- 
tistics (particularly the cell covariance matrix) . 

In this case, the means and the variances are tested
independently. First the cell and field means are tested for
similarity based on the statistic

$$\Lambda_{1i} = \frac{(T-2)\, r s\, (\bar{x}_i - \bar{y}_i)^2}{T\, \sigma_i}, \qquad i = 1, 2, \ldots, n$$

where

$\bar{x}_i$ is the field mean in channel i
$\bar{y}_i$ is the cell mean in channel i
r is the number of pixels in the field
s is the number of pixels in the cell
T = r + s
$\sigma_i = \sum_{j=1}^{r} (x_{ij} - \bar{x}_i)^2 + \sum_{j=1}^{s} (y_{ij} - \bar{y}_i)^2$.

Under the hypothesis that field and cell have the same
distribution, this statistic has an F distribution with 1 and
(T-2) degrees of freedom. Large values of $\Lambda_{1i}$ indicate
that the hypothesis is not true. The field and cell will not be
merged if any component of the means fails to pass this test at
a level of significance defined by a user-supplied threshold
constant.
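The per-channel mean test can be sketched in Python/NumPy as below. Λ_1i is the square of the pooled two-sample t statistic, which is why it follows F(1, T-2). The function names are hypothetical, and the F critical value is taken as an input because the text does not state how NS1ECHO converts the significance level on the ANNEXATION card into a critical value.

```python
import numpy as np

def mean_statistic(field, cell):
    """Lambda_1i for every channel: (T-2) r s (xbar_i - ybar_i)^2 / (T sigma_i).

    field: array (r, n) of field pixels; cell: array (s, n) of cell pixels.
    """
    x, y = np.asarray(field, dtype=float), np.asarray(cell, dtype=float)
    r, s = len(x), len(y)
    T = r + s
    xbar, ybar = x.mean(axis=0), y.mean(axis=0)
    sigma = ((x - xbar) ** 2).sum(axis=0) + ((y - ybar) ** 2).sum(axis=0)
    return (T - 2) * r * s * (xbar - ybar) ** 2 / (T * sigma)

def means_similar(field, cell, f_crit):
    """Means pass only if no channel's statistic exceeds f_crit, the
    F(1, T-2) critical value assumed to be supplied by the caller."""
    return bool(np.all(mean_statistic(field, cell) <= f_crit))
```

For a field with values 10, 12, 11, 13 and a cell with values 20, 22, 21, 23 in one channel, the statistic is 6 * 4 * 4 * 100 / (8 * 10) = 120, far above any usual critical value, so the cell would not be annexed.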

If the means pass the similarity test, then the channel 
variances are tested. The cell and field variances are tested 
for similarity based on the statistic 



u-9 + 1 g 2 )s 1 

1 ' (1-9+1 9 2 >? °i 


where 


= ±(_i- + 

3 k r-l 


1 

s— 1 



i—l, 2 , . . n 


r, s, T are as defined above 


and

$$G_i = \log\left\{ \frac{\left(\frac{\sigma_i}{T-2}\right)^{T-2}}{\left(\frac{\sigma_{xi}}{r-1}\right)^{r-1}\left(\frac{\sigma_{yi}}{s-1}\right)^{s-1}} \right\}$$

where

$$\sigma_{xi} = \sum_{j=1}^{r} (x_{ij} - \bar{x}_i)^2, \qquad \sigma_{yi} = \sum_{j=1}^{s} (y_{ij} - \bar{y}_i)^2, \qquad \sigma_i = \sigma_{xi} + \sigma_{yi}.$$

Under the hypothesis that the field and cell have the same
distribution, $\Lambda_{2i}$ has an F distribution with 1 and $\beta$
degrees of freedom. The field and cell will not be merged if the
data in any of the channels fail to pass this test at a level of
significance defined by a user-supplied threshold constant.
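The quantity G_i has the form of Bartlett's statistic for comparing two sample variances, and it can be computed from the field and cell pixels as in the Python/NumPy sketch below. This is an illustration under stated assumptions (natural logarithm, hypothetical function name); it stops at G_i and does not attempt the final scaling into Λ_2i or the β degrees of freedom. G_i is zero when the two sample variances are equal and positive otherwise.

```python
import numpy as np

def variance_G(field, cell):
    """G_i per channel, evaluated in log form to avoid overflow:
    (T-2) ln(sigma_i/(T-2)) - (r-1) ln(sigma_xi/(r-1)) - (s-1) ln(sigma_yi/(s-1))."""
    x, y = np.asarray(field, dtype=float), np.asarray(cell, dtype=float)
    r, s = len(x), len(y)
    T = r + s
    sx = ((x - x.mean(axis=0)) ** 2).sum(axis=0)   # sigma_xi
    sy = ((y - y.mean(axis=0)) ** 2).sum(axis=0)   # sigma_yi
    sig = sx + sy                                  # sigma_i
    return ((T - 2) * np.log(sig / (T - 2))
            - (r - 1) * np.log(sx / (r - 1))
            - (s - 1) * np.log(sy / (s - 1)))
```

Evaluating the logarithm term by term avoids raising the variance estimates to the powers T-2, r-1, and s-1, which could overflow for large fields.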


Maximum Likelihood Sample Classification (NS2ECHO): The objects
defined by the nonsupervised object-finding process may be
subsequently classified by a sample classification rule. This
is a logical step to perform only if it is done by a supervised
sample classifier, however, and we have already noted that the
supervised classifier used is the same as that used following
supervised object finding.
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ECHO PARAMETER SELECTION GUIDELINES 


This section discusses settings of the object extraction
parameters required by the supervised ECHO (SECHO) and non-
supervised ECHO (NS2ECHO) processors. These comments have their
origin in the test and evaluation of the ECHO processors per-
formed between June 1976 and August 1977 and reported in the
LARS Final Technical Report to JSC in May 1977 [5] and the LARS
Final Technical Report to JSC in November 1977 [6].

The results of Landsat and simulated Thematic Mapper data
are discussed. The Landsat data were drawn from two sources:
LACIE/SRS data sets collected over Kansas, where the principal
information classes (wheat and other) are in relatively large
fields, and CITARS data sets collected over Indiana and Illinois,
where the principal information classes (corn, soybeans, and
other) occur in relatively small fields.

The simulated Thematic Mapper data, collected over Kansas
and North Dakota, have relatively large fields and are simulated
at 30, 40, 50 and 60 meter resolutions.

Six variables were monitored to evaluate the ECHO algorithms:

• CPU time,
• Field center pixel classification performance,
• Training field classification performance,
• Full field classification performance,
• RMS proportion estimate error, and
• Classification variability.
These variables are related to the reasons for adopting a new
classification technique: cost, accuracy, and usability of
results. The CPU time required to perform a classification is
one way to measure the cost of classification. Field center
pixel, full field, and training field performances and RMS
proportion estimate error are all ways to evaluate the accuracy
of the classifier. Classification variability is a measure of
the "salt and pepper effect" in classification results.

The CPU time required to execute each of the ECHO classifi-
cations has been recorded so that the effects of varying the
cell homogeneity and annexation thresholds may be monitored.

The CPU times required to perform the perpoint classifications
have been adjusted to reflect the increased efficiency of the
LARSYS perpoint classifier, which is coded in assembly language.
Thus, the CPU time recorded for a perpoint classification is
what a FORTRAN classifier would have required to perform the
classification.

The indices of classification performance were applied in 
several ways. Classification accuracy (identification) was 
evaluated utilizing field center pixel, "full field" and test 
field sample performances for all data sets. Proportion esti- 
mation was carried out for the Landsat and Simulated Thematic 
Mapper data sets. 

The training performance is the overall classification
accuracy (number of training pixels correctly classified divided
by the total number of training pixels) of the pixels used to
calculate the class statistics. Field center pixel performance
is the overall classification accuracy of pixels inset at
least one pixel from the field boundary. For the registered
LACIE/SRS data the field center pixels are inset at least two
pixels from the field boundary. Although this procedure ensures
that the pixels examined are not mixture pixels, it has the
unfortunate effect of eliminating smaller fields from considera-
tion. The third measure of classification accuracy, "full
field" performance, includes those pixels on the boundaries
of the fields in the classification performance. The "full
field" pixels were generated by expanding the field center
pixel boundaries one pixel in all directions.
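The pixel sets behind these accuracy measures can be sketched as a one-pixel erosion and dilation of a field mask. This Python sketch assumes fields are given as sets of (line, column) coordinates and that "inset" and "expanding in all directions" use the eight surrounding pixels; the function names are hypothetical.

```python
from typing import Dict, Hashable, Iterable, List, Set, Tuple

Pixel = Tuple[int, int]  # (line, column) coordinates


def _neighbors(p: Pixel) -> List[Pixel]:
    """The eight pixels surrounding p (8-connectivity is an assumption)."""
    r, c = p
    return [(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
            if (dr, dc) != (0, 0)]


def field_center_pixels(field: Set[Pixel], inset: int = 1) -> Set[Pixel]:
    """Pixels at least `inset` pixels away from the field boundary."""
    center = set(field)
    for _ in range(inset):
        # Keep only pixels whose eight neighbours all remain in the region
        center = {p for p in center if all(q in center for q in _neighbors(p))}
    return center


def full_field_pixels(field: Set[Pixel], inset: int = 1) -> Set[Pixel]:
    """Field center pixels expanded by one pixel in all directions."""
    center = field_center_pixels(field, inset)
    expanded = set(center)
    for p in center:
        expanded.update(_neighbors(p))
    return expanded


def overall_accuracy(pixels: Iterable[Pixel], true_class: Hashable,
                     predicted: Dict[Pixel, Hashable]) -> float:
    """Fraction of the given pixels assigned to their true class."""
    pixels = list(pixels)
    if not pixels:
        raise ValueError("no pixels to evaluate")
    correct = sum(1 for p in pixels if predicted.get(p) == true_class)
    return correct / len(pixels)
```

For a 5 by 5 field, the field center set with inset 1 is the inner 3 by 3 block, and expanding it by one pixel recovers the whole field. Using inset=2 mimics the LACIE/SRS convention, and per-field results can be pooled by summing correct and total counts across fields.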

The RMS error of informational class proportion estimates
for each flightline was found by calculating the percent of the
flightline classified as a particular class and comparing it with
the ground-collected estimate using equation (1).

$$\text{RMS Error} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(C_i - E_i\right)^2} \qquad (1)$$

where N = number of informational classes,
$C_i$ = percent classified as informational class i, and
$E_i$ = percent of class i estimated from ground-collected
data.
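A short Python sketch of the RMS proportion error, assuming it is the square root of the mean squared difference, over the N informational classes, between the percent classified and the ground-collected percent. The function name and the example percentages are hypothetical.

```python
import math
from typing import Sequence


def rms_proportion_error(classified_pct: Sequence[float],
                         ground_pct: Sequence[float]) -> float:
    """Root mean square difference over the N informational classes."""
    if not classified_pct or len(classified_pct) != len(ground_pct):
        raise ValueError("need one classified and one ground percentage per class")
    n = len(classified_pct)
    squared = sum((c - g) ** 2 for c, g in zip(classified_pct, ground_pct))
    return math.sqrt(squared / n)
```

For instance, a flightline classified as 50, 30 and 20 percent corn, soybeans and other, against ground estimates of 40, 40 and 20 percent, gives an RMS error of about 8.2 percentage points.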

RMS error is calculated for the Landsat and Thematic Mapper 
data runs. The Agricultural Stabilization and Conservation 
Service (ASCS) provided the ground truth proportion estimates 
for the simulated Thematic Mapper data set. Proportion estimates 
for the 1974 LACIE/SRS segments were provided in ground truth 
packets received from JSC. The SRS county proportion estimates
were used to calculate RMS proportion error for the CITARS 
data set. 
Average variability is a measure of the rate of change
from one information class to another. It should reflect the
degree to which ECHO reduces the "salt and pepper effect"
which is sometimes present in perpoint classifications. Varia-
bility is calculated by systematically selecting 50 lines of
the classified area, counting the number of information class
changes, and dividing by the number of opportunities for class
changes:

$$\text{Variability} = \frac{NCC}{50\,(NS - 1)} \qquad (2)$$

where

NCC = the number of class changes over the 50 selected
lines, and

NS = the number of classified pixels per line.
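The variability measure can be sketched in Python as follows. The text says only that 50 lines are selected "systematically", so this sketch assumes evenly spaced lines starting with the first; the function name and the `n_selected` parameter are hypothetical.

```python
from typing import Hashable, Sequence


def classification_variability(lines: Sequence[Sequence[Hashable]],
                               n_selected: int = 50) -> float:
    """NCC / (n_selected * (NS - 1)) over systematically selected lines."""
    if n_selected < 1 or len(lines) < n_selected:
        raise ValueError("not enough classified lines to select from")
    step = len(lines) / n_selected
    # Systematic sample: evenly spaced lines, starting with the first (an assumption)
    selected = [lines[int(k * step)] for k in range(n_selected)]
    ns = len(selected[0])
    if ns < 2 or any(len(line) != ns for line in selected):
        raise ValueError("selected lines must share a length of at least two pixels")
    # NCC: adjacent pixel pairs along a line whose information classes differ
    ncc = sum(1 for line in selected for a, b in zip(line, line[1:]) if a != b)
    return ncc / (n_selected * (ns - 1))
```

Each line of NS pixels offers NS - 1 chances for a class change, which is why the denominator is 50 (NS - 1). A perfectly uniform classification scores 0, and a line that alternates classes at every pixel scores 1.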

Supervised ECHO Parameters 

Landsat Parameter Selection for the Supervised ECHO Processor:
The LACIE and the CITARS data sets appear to have different
requirements in parameter settings. This is reasonable, as the
two data sets have very different ranges of average field size
and different ground cover types. The average field sizes in
the CITARS data sets range from 17 pixels in Shelby to 23 in
Livingston; the average field sizes in the LACIE data sets range
from 78 pixels in Haskell to 91 in Graham. The LACIE data sets
are composed of the classes wheat and other, while the CITARS
data sets are composed of corn, soybeans, and other.

The cell width setting that optimizes the field center
pixel and full field performances varies over the data sets, with
a cell width of 2 most frequently providing the optimal results.
There appears to be a slight tendency for larger cell widths to
perform better at smaller average field sizes, which is not
consistent with our expectations and is difficult to justify
theoretically. The training performance, however, is consistently
optimal at a cell width of 2. The proportion estimate error
follows a different pattern for the CITARS data sets than for
the LACIE data sets. For the CITARS data sets a cell width of 4
is best when the number of spectral classes is less than 10;
when the number of spectral classes is greater than or equal to
10, a cell width of 2 is better. The opposite pattern holds for
the LACIE data sets: a cell width of 2 is best when the number
of spectral classes is less than 10, and a value of 4 or 5 is
better when the number of spectral classes is greater than or
equal to 10. For both CPU time and classification variability,
cell width settings of 4 for the CITARS data sets and from 2 to
4 for the LACIE data sets will give optimal results.
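The proportion error pattern above can be encoded as a small lookup. This Python sketch is a hypothetical helper that only restates the reported tendencies. It uses the Figure 14 split of average field size (below 40 pixels for the CITARS data, above 40 for the LACIE data) as a stand-in for the data set, and treats exactly 10 spectral classes as "greater than or equal to 10", as the text does.

```python
from typing import Tuple


def landsat_proportion_cell_widths(avg_field_size: float,
                                   n_spectral_classes: int) -> Tuple[int, ...]:
    """Cell widths reported to minimize RMS proportion error (supervised ECHO, Landsat)."""
    citars_like = avg_field_size < 40      # small fields, as in the CITARS data sets
    few_classes = n_spectral_classes < 10
    if citars_like:
        return (4,) if few_classes else (2,)
    # LACIE-like scenes show the opposite pattern
    return (2,) if few_classes else (4, 5)
```

For example, a Shelby-like scene with 17-pixel fields and 8 spectral classes maps to a cell width of 4, while a Graham-like scene with 91-pixel fields and 12 spectral classes maps to 4 or 5.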

The optimal cell homogeneity settings are rather scattered
and inconsistent for field center pixel, full field, and training
performances as well as for proportion estimate error. There
appears to be a slight tendency toward larger values of the cell
homogeneity parameter optimizing field center pixel, full field
and training field performances as the average field size
increases. For field center pixel performance, no one value
consistently yields superior results for the CITARS data sets;
however, a homogeneity setting of 79 is most often optimal for
the LACIE data sets. For full field performance, values around
15 and around 118 often appear as the optimal cell homogeneity
setting for the CITARS data sets; for the LACIE data sets,
homogeneity settings around 40 and 80 often give optimal values.
For training performance, homogeneity settings between 60 and 120
appear equally often as the optimal settings; a narrower
recommendation is difficult to make. A cell homogeneity
recommendation is easier to make for minimizing the CPU time
required or the classification variability produced, as a
setting of 120 or more always minimized both.

The optimal cell annexation parameter settings are some-
what inconsistent for field center pixel performance. There is
a slight tendency for larger annexation values (2-4) to yield
improved field center pixel performance for runs having large
average field sizes (above 60 pixels). The CITARS data sets
have optimal field center pixel performance with settings of
0 or 1, while the LACIE data sets, with large average field
sizes, have optimal performance for annexation settings of 2
or 4. Similarly, for full field performance, the CITARS data
sets generally perform best with an annexation setting of 1 and
the LACIE data sets perform best with an annexation setting of 2.
For training performance, a setting of 2 gives the optimum for
most Landsat data sets. Both proportion estimate error and
classification variability are minimized with an annexation
setting of 4, while CPU time is lowest with annexation settings
of 1 or 4.
Figure 14

Supervised ECHO Landsat Parameter Settings
to Optimize Six Variables

Dependent Variable             | Cell Width Setting             | Homogeneity Parameter Setting            | Annexation Threshold Setting
-------------------------------------------------------------------------------------------------------------------------------------
Field Center Pixel Performance | 2                              | 20-80 (higher as AFS increases)          | AFS<40: 0 to 2; AFS>40: 2 to 4
Full Field Performance         | AFS<40: 3; AFS>40: 2           | AFS<40: 15-30 or 100-130; AFS>40: 35-130 | AFS<40: 1; AFS>40: 2
Training Field Performance     | 2                              | 80-120                                   | 2
Proportion Estimate Error      | AFS<40: 4 (SP<10), 2 (SP>10);  | 20-80 (higher as AFS increases)          | 4
                               | AFS>40: 2 (SP<10), 4-5 (SP>10) |                                          |
Classification Variability     | Largest possible               | ∞                                        | ∞
CPU Time                       | 2-4 (higher as AFS increases)  | ∞                                        | ∞

AFS = Average Field Size (AFS<40, CITARS data; AFS>40, LACIE data)
SP = Spectral Classes
Thematic Mapper Parameter Selection for the Supervised ECHO
Processor: The results are fairly consistent except at the 50
meter resolution. For the supervised ECHO processor, a cell
width of 2 is best for field center pixel performance and training
field performance. This choice is also best for full field per-
formance, CPU time, and variability, except at the 50 meter
resolution. The root mean square proportion estimate error is
minimized when the cell width is the integer part of the square
root of the average field size.
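The cell width rule for proportion estimation is simple arithmetic. Here is a minimal Python sketch, assuming the average field size is expressed in pixels; the function name is hypothetical.

```python
import math


def tm_proportion_cell_width(avg_field_size: float) -> int:
    """Integer part of the square root of the average field size (in pixels)."""
    if avg_field_size < 1:
        raise ValueError("average field size must be at least one pixel")
    # math.isqrt(floor(x)) equals floor(sqrt(x)) for x >= 0 and avoids rounding error
    return math.isqrt(math.floor(avg_field_size))
```

An average field size of 30 pixels gives a cell width of 5, and 80.5 pixels gives 8. A cell that wide roughly matches the side of a typical square field.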

The six settings of cell homogeneity tested for the
supervised ECHO on simulated Thematic Mapper data are 19, 32,
45, 68, 91 and 136. Field center pixel performance is highest
at settings between 68 and 91, except at the 50 meter resolution,
where a setting between 19 and 32 does better. Similarly, for
full field performance, the best cell homogeneity setting is
around 68, except at the 50 meter resolution, where values
between 19 and 32 are better. Training field performance is best
when the homogeneity parameter is set around 19 and the average
field size is less than 75 pixels; otherwise, homogeneity values
between 32 and 45 yield higher training field performance. The
cell homogeneity setting is less important in optimizing
proportion estimates: when the cell width is the integer part of
the square root of the average field size, all cell homogeneity
settings between 32 and 91 produced very similar results. For
both variability and CPU time, the highest homogeneity setting
tested (136) yielded the optimal results.

For all the measures except proportion estimate error, a
cell annexation setting of 4 yielded superior results. With
respect to proportion estimation error, all settings produced
similar results.
Figure 15

Supervised ECHO Simulated Thematic Mapper Parameter
Settings to Optimize Six Variables

Dependent Variable             | Cell Width Setting             | Homogeneity Parameter Setting | Annexation Threshold
------------------------------------------------------------------------------------------------------------------
Field Center Pixel Performance | 2                              | 60-95**                       |
Full Field Performance         | 2*                             | 60-70**                       |
Training Field Performance     | 2                              | AFS<75: 15-25; AFS>75: 30-50  |
Proportion Estimate Error      | 4-6 (AFS>30)                   |                               |
Classification Variability     | 2-4 (larger as AFS increases)  | ∞                             |
CPU Time                       | 2-4 (larger as AFS increases)  | ∞                             |

* Except at resolution 50
** Except at resolution 50, where 20-40
AFS = Average Field Size
∞ is optimal
Nonsupervised ECHO Parameters

Nonsupervised ECHO Landsat Parameter Selection: Only a cell width
of 2 was used on the nonsupervised ECHO data sets. The recom-
mendations are thus made only on cell homogeneity and cell
annexation parameter settings.

The optimal cell homogeneity settings are not very consis-
tent for field center pixel, full field and training performances,
where the optimum tends to alternate between 0.05 and 0.25. For
proportion estimate error, a setting of 0.05 is best for the
CITARS data sets, while a setting of 0.10 is best for the LACIE
data sets. For variability and CPU time, a setting of 0.25 is
the optimum for almost all data sets. A cell annexation setting
of 0.010 gives optimal results for field center pixel, full
field, and training performances and for proportion estimate
error.

A cell annexation setting of 0.001 yields classification results
with the lowest classification variability ("salt and pepper"
effect) and requires the least CPU time to execute for a given
area.
Figure 16

Nonsupervised ECHO Landsat Parameter Settings
to Optimize Six Variables

Dependent Variable             | Cell Width Setting       | Homogeneity Parameter Setting       | Annexation Parameter Setting
------------------------------------------------------------------------------------------------------------------------
Field Center Pixel Performance | 2 (only width tested)    | .05 - .25                           | .005, .01, ...
Full Field Performance         | 2 (only width tested)    | .05 - .25                           | .005, .01, ...
Training Field Performance     | 2 (only width tested)    | .05 - .10                           | .005, .01, ...
Proportion Estimate Error      | 2 (only width tested)    | .05 - .10 (larger as AFS increases) | .005, .01, ...
Classification Variability     | 2 (only width tested)    |                                     | .001
CPU Time                       | 2 (only width tested)    | ∞                                   | .001

AFS = Average Field Size
Nonsupervised ECHO Thematic Mapper Parameter Selection: Many
of the parameter settings for the nonsupervised ECHO algorithm
appear to be related to the number of spectral classes. For
both field center pixel and full field performance, a cell
width of 2 is better when the number of spectral classes is
less than 30, and a cell width of 3 is better when the number of
spectral classes is greater than 30. For training performance,
the same pattern holds except that the dividing value is 20
spectral classes. The reverse pattern appears for variability:
a cell width of 2 minimizes variability in the classification
results if the number of spectral classes is greater than 30,
and a cell width of 3 minimizes classification variability when
the number of spectral classes is less than 30. A cell width
setting of 3 minimizes both the proportion estimate error and
the CPU time required.
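These spectral-class thresholds can be restated as a small lookup. The Python sketch below is hypothetical: the objective names are illustrative labels, not ECHO parameters. The text does not say what to do at exactly 30 (or 20) spectral classes, so the sketch assumes those boundary values fall in the upper group.

```python
def ns_tm_cell_width(objective: str, n_spectral_classes: int) -> int:
    """Cell width favoured by the nonsupervised ECHO Thematic Mapper results."""
    if objective in ("field_center", "full_field"):
        return 2 if n_spectral_classes < 30 else 3
    if objective == "training":
        return 2 if n_spectral_classes < 20 else 3
    if objective == "variability":
        # Reverse pattern: the wider cell helps when there are fewer spectral classes
        return 3 if n_spectral_classes < 30 else 2
    if objective in ("proportion_error", "cpu_time"):
        return 3
    raise ValueError(f"unknown objective: {objective!r}")
```

For example, with 25 spectral classes a cell width of 2 favors field center pixel performance, but a cell width of 3 favors training performance and low classification variability.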

The optimal cell homogeneity parameter settings also appear
to be related to the number of spectral classes. For both field
center pixel and full field performance, a cell homogeneity
setting of 0.05 is best when the number of spectral classes is
greater than 30, while a value of 0.10 is better when the number
of spectral classes is less than 30. For both training perfor-
mance and root mean square error, a cell homogeneity setting of
0.05 gives optimal results, while both CPU time and the variability
of the classification are minimized with a cell homogeneity
parameter of 0.25.

A cell annexation parameter setting around 0.10 yields the
optimal field center pixel performance, while values between
0.010 and 0.100 optimize full field performance and minimize
proportion estimation error. Training performance, CPU time
required, and variability present are all optimized by a cell
annexation setting of 0.001.
Figure 17

Nonsupervised ECHO Simulated Thematic Mapper
Parameter Setting to Optimize Six Variables

Dependent Variable             | Cell Width Setting    | Homogeneity Parameter Setting   | Annexation Threshold Setting
---------------------------------------------------------------------------------------------------------------------
Field Center Pixel Performance | SP<30: 2; SP>30: 3    | SP<30: .08-.12; SP>30: .03-.07  | .005, .01, ...
Full Field Performance         | SP<30: 2; SP>30: 3    | SP<30: .08-.12; SP>30: .03-.07  | .01, .025, .05
Training Field Performance     | SP<20: 2; SP>20: 3    | .03-.07                         | .001, .025
Proportion Estimate Error      |                       | .03-.07                         | .01, .025, .05
Classification Variability     | SP<30: 3; SP>30: 2    | ∞                               |
CPU Time                       | As large as possible  | ∞                               |

SP = Spectral Classes